EXTENDED ABSTRACT


1. ABSTRACT 

A dynamic programming based approach for data compression of a 1D sequence is presented.
The compression of an input sequence of size $N$ to a smaller size $k$ is achieved by
dividing the input sequence into $k$ subsequences and replacing each subsequence by its
average value. The input sequence is partitioned with the aim of minimizing the mean squared
error in the reconstructed sequence. The complexity of finding the partition that yields such
an optimal compressed sequence is reduced by the dynamic programming approach presented here.

2. INTRODUCTION

Problem Definition. The problem is defined as follows. Given a 1D input sequence of size
$N$ and a compression parameter $k$, we need to divide the input sequence into $k$
subsequences and replace each subsequence by its average value. The goal is to find an
optimal partitioning that minimizes the error due to this approximation. If the input
sequence is partitioned at the points $n_i = n_0, n_1, \ldots, n_k$, with $n_0 = 1$ and
$n_k = N$, the net error due to the approximation is

$$E = \sum_{i=1}^{k} l_i \sigma_i^2$$

where $l_i$ is the length of subsequence $i$, and $\sigma_i^2$ and $\mu_i$ are the variance
and mean of the samples in subsequence $i$, the parameters that result from the approximation.
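
As a concrete illustration of the error measure, the following minimal sketch evaluates the net error of one candidate partition. It assumes that the net error is the sum over subsequences of length times variance, which equals the sum of squared deviations from each subsequence mean. The function name `partition_error` and the 0-based, half-open boundary convention are choices made for this example and do not come from the paper.

```python
def partition_error(x, boundaries):
    """Sum of squared errors when each segment is replaced by its mean.

    `boundaries` holds the start index of every segment followed by len(x),
    so segment i covers x[boundaries[i]:boundaries[i + 1]] (an assumed,
    0-based convention used only in this sketch).
    """
    total = 0.0
    for start, stop in zip(boundaries[:-1], boundaries[1:]):
        segment = x[start:stop]
        mean = sum(segment) / len(segment)
        # l_i * sigma_i^2 is the sum of squared deviations inside the segment
        total += sum((value - mean) ** 2 for value in segment)
    return total


x = [1, 1, 5, 5, 5, 9]
print(partition_error(x, [0, 2, 5, 6]))  # partition aligned with the constant runs
print(partition_error(x, [0, 3, 6]))     # partition that cuts through a run
```

A partition whose boundaries coincide with the constant runs of the input incurs no error, while any partition that mixes different values inside a subsequence does.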
Motivation. Image data compression involves minimizing the number of bits required
to represent an image. Data compression is used extensively in applications involving data
transmission and data storage. Data transmission applications include remote sensing via
satellite, radar and sonar, teleconferencing, computer communications, and facsimile
transmission. Image storage is required for medical images, magnetic resonance imaging,
digital radiology, weather maps, geological surveys, fingerprint storage, etc.

Data compression methods can be broadly divided into two categories, called lossless and
lossy compression techniques. From an algorithmic standpoint, they are classified into: 1)
spatial domain methods; 2) transform domain methods; 3) statistical methods; and 4) hybrid
methods. There exists another classification, in information theory, where these algorithms
are divided into channel based methods and source based methods.

The approach described in this paper follows a lossy compression strategy. We solve the
problem by mapping the input sequence onto the leaves, i.e., the external nodes, of a complete
binary tree, as shown in Figure 1. We adopt a compact binary tree representation. Each internal
node of the binary tree has a value equal to the mean of the values of its children. A
closer look reveals that the problem at hand is equivalent to that of finding a cutset of size
$k$ in the binary tree, as shown in Figure 2. The complexity involved in such an effort is
reduced by using the proposed dynamic programming approach.

3. PRELIMINARY CONCEPTS 


For an input sequence of size $N$, the height $h$ of the binary tree is equal to $\log_2 N$.
Assuming that node $j$ is at level $d$, there are $l = 2^{h-d-1}$ leaf nodes below it; let
$\mu_j$ and $\sigma_j^2$ denote the mean and variance of these leaves.
Each internal node $j$ in the tree stores the mean value $\mu_j$ and a cost function $C_j$,
where $C_j = n_j \sigma_j^2$, $n_j$ being the number of leaves that are descendants of $j$ and
$\sigma_j^2$ their variance. The cost function thus represents the net error that would be
introduced if the values of these leaves were replaced by $\mu_j$. The mean and the cost
function at $j$ can be obtained recursively from the means and cost functions of its two
immediate children using the relationships


$$\mu_j = \frac{\mu_{2j} + \mu_{2j+1}}{2}$$

$$n_j \sigma_j^2 = n_{2j} \sigma_{2j}^2 + n_{2j+1} \sigma_{2j+1}^2 + \frac{n_j}{4} \left( \mu_{2j} - \mu_{2j+1} \right)^2$$
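
The bottom-up computation of the node means and costs can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: it assumes heap-style node numbering (root 1, children of node j at 2j and 2j+1, leaves at indices N to 2N-1), an input length that is a power of two, and a merge coefficient of n_j / 4, which is what the sum-of-squares identity gives when both children have the same number of leaves. The name `build_tree` is hypothetical.

```python
def build_tree(x):
    """Return (mean, cost, count) lists indexed by heap-numbered node.

    cost[j] holds n_j * sigma_j^2 and count[j] holds n_j, the number of
    leaves below node j. Index 0 is unused.
    """
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    mean = [0.0] * (2 * n)
    cost = [0.0] * (2 * n)
    count = [0] * (2 * n)
    for i, value in enumerate(x):
        mean[n + i] = float(value)   # a leaf stores its own sample
        count[n + i] = 1             # and has zero cost
    for j in range(n - 1, 0, -1):    # visit internal nodes bottom-up
        left, right = 2 * j, 2 * j + 1
        count[j] = count[left] + count[right]
        mean[j] = (mean[left] + mean[right]) / 2
        cost[j] = (cost[left] + cost[right]
                   + count[j] / 4 * (mean[left] - mean[right]) ** 2)
    return mean, cost, count


mean, cost, count = build_tree([2, 4, 6, 8])
print(mean[1], cost[1])  # mean and total squared deviation of the whole input
```

Because each internal node is computed from its two children only, the whole tree is filled in a single pass over the internal nodes.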






Given the compression parameter $k$, the problem of data compression is one of finding a
node cutset, that is, a set of $k$ nodes that separates the root from all its leaves. For
example, the dashed line in Figure 2 constitutes a cutset of size $k = 11$. Thus, the problem
of finding a sequence of length $k$ is equivalent to that of finding a cutset containing $k$
elements in the binary tree. The input signal can be reconstructed, with some error, from the
cutset by propagating the mean value stored at each node in the cutset to the leaves that are
its descendants in the binary tree, as shown in Figure 2. Also, the total mean square error in
the reconstructed sequence can be computed by simply adding the cost functions stored at the
nodes in the cutset.
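
The reconstruction step described above can be sketched as follows. The sketch assumes heap-style node numbering (root 1, children of node j at 2j and 2j+1, leaves at indices N to 2N-1 for an input of size N, with N a power of two) and represents a cutset as a dictionary from node number to stored mean; both the representation and the name `reconstruct` are assumptions made for illustration.

```python
def reconstruct(n, cutset):
    """Expand a cutset {node: mean} into a sequence of length n."""
    out = [None] * n
    for node, mu in cutset.items():
        lo, hi = node, node
        while lo < n:                 # descend to the leftmost and rightmost
            lo, hi = 2 * lo, 2 * hi + 1   # leaves below this node
        for leaf in range(lo, hi + 1):
            out[leaf - n] = mu        # every leaf below the node gets its mean
    if any(value is None for value in out):
        raise ValueError("cutset does not cover every leaf")
    return out


# Input of size 4: node 2 covers the first two samples, nodes 6 and 7 are leaves.
print(reconstruct(4, {2: 3.0, 6: 6.0, 7: 8.0}))
```

A set of nodes that leaves some leaf uncovered is not a cutset, which the sketch reports as an error.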


Thus, the problem of finding an optimal compressed sequence becomes one of finding the
cutset in the compression tree from which the reconstructed signal of length $N$ best
represents the original signal. A cutset containing $k$ elements has $k - 1$ internal nodes
above it, so the tree pruned at the cutset has $2k - 1$ nodes. Thus one approach toward
finding an optimal solution would be to construct all the binary trees containing $2k - 1$
nodes, with the limitation that the height of the tree does not exceed $\log_2 k - 1$, and
find the tree that best reproduces the original signal. Without the limitation the problem
has an exponential complexity [1]. Thus, this direct approach toward solving the problem is
not feasible. However, the proposed dynamic programming approach solves the problem with a
complexity of $O(n^2 \log n)$.

4. DYNAMIC PROGRAMMING SOLUTION 


The problem of data compression has two key ingredients that make a dynamic programming
solution feasible: optimal substructure and overlapping subproblems [1].


A problem is said to have an optimal substructure when it obeys the Principle of
Optimality [2]. The principle states that an optimal sequence of decisions has the property
that whatever the initial state and decision are, the remaining decisions must constitute an
optimal decision sequence with regard to the state resulting from the first decision. To
explain the suitability of the dynamic programming technique, it is necessary to describe the
process of finding the node cutset.
Figure 2. Compression and reconstruction of an input sequence of size 16. Left: an example
illustrating the building of the binary compression tree. Right: reconstruction of the
sequence for the given cutset, by propagation of the mean at each node of the cutset to its
leaves. A cutset containing $k$ elements gives a compressed sequence of size $k$.


A cutset CS having $k$ nodes can be viewed as the concatenation of two cutsets, LCS and
RCS, that belong to the left and right subtrees of the root of the binary tree, respectively.
If the number of nodes in LCS is $p$, then RCS contains $k - p$ nodes. Thus the cost of
transmitting CS is equal to the sum of the costs of transmitting LCS and RCS. We thus need to
find a $p$, $1 \le p < k$, that gives a CS of minimum cost. The key to obtaining an optimal
solution is that, for a given value of $p$, LCS and RCS must themselves be optimal cutsets of
their respective subtrees. If this were not so, it would be possible to find an LCS and/or RCS
of lesser cost, and hence a final cutset of lower cost. Thus, the optimal solution to the
problem is composed of optimal solutions to its subproblems, which indicates that the problem
exhibits an optimal substructure.

To find the optimal solution we modify our representation of the cost function slightly.
The cost function, denoted $C(j, k)$, represents the total error that results if a cutset of
size $k$, taken from the subtree rooted at node $j$, is included in the final cutset. Note
that $C(j, 1)$ represents the total error when node $j$ alone forms the cutset of its subtree,
so that every leaf below $j$ is replaced by the mean value stored at $j$. $C(j, 1)$ can be
computed using the formula

$$C(j, 1) = C(2j, 1) + C(2j+1, 1) + \frac{n_j}{4} \left( \mu_{2j} - \mu_{2j+1} \right)^2$$

where $n_j$ is the number of leaves below node $j$.
Thus, the total error that results if we transmit a sequence of size $k$ for an input
sequence of size $N$ is $C(1, k)$, the root being node 1 when the children of node $j$ are
numbered $2j$ and $2j + 1$. Equivalently, the optimal solution to the compression problem can
be obtained if we define $C(j, k)$ as


$$C(j, k) = \begin{cases} C_j & \text{for } k = 1 \\ \min_{1 \le p < k} \left[ C(2j, p) + C(2j+1, k-p) \right] & \text{for } k > 1 \end{cases} \qquad (1)$$
One method of solving the above problem would be to adopt a divide and conquer approach
and solve the problem top-down. But the problem of data compression has a “small” space
of subproblems, polynomial in the input size. A divide and conquer approach solves the
same subproblems over and over again, thus increasing the complexity.
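
The repeated work can be made visible by counting how often the recursive definition of C(j, k) is actually evaluated with and without a cache. The sketch below is illustrative only. It assumes heap-style node numbering with the root at node 1, an input length that is a power of two, and the restriction that a subtree cannot receive more cutset nodes than it has leaves. The helper `make_solver` and the sample data are hypothetical.

```python
from functools import lru_cache


def make_solver(x, memoize):
    """Return (C, calls): C(j, k) evaluates the recurrence top-down and
    calls[0] counts how many times its body is executed."""
    n = len(x)                                   # assumed to be a power of two
    mean = [0.0] * n + [float(v) for v in x]
    size = [0] * n + [1] * n                     # number of leaves below a node
    cost = [0.0] * (2 * n)                       # C(j, 1) for every node
    for j in range(n - 1, 0, -1):
        a, b = 2 * j, 2 * j + 1
        size[j] = size[a] + size[b]
        mean[j] = (mean[a] + mean[b]) / 2
        cost[j] = cost[a] + cost[b] + size[j] / 4 * (mean[a] - mean[b]) ** 2
    calls = [0]

    def C(j, k):
        calls[0] += 1
        if k == 1:
            return cost[j]
        half = size[j] // 2
        # p nodes go to the left subtree and k - p to the right; neither
        # side may receive more nodes than it has leaves
        return min(C(2 * j, p) + C(2 * j + 1, k - p)
                   for p in range(max(1, k - half), min(half, k - 1) + 1))

    if memoize:
        C = lru_cache(maxsize=None)(C)           # recursive calls now hit the cache
    return C, calls


data = [3, 3, 4, 9, 9, 9, 1, 2, 2, 2, 7, 7, 8, 8, 8, 5]
plain, plain_calls = make_solver(data, memoize=False)
cached, cached_calls = make_solver(data, memoize=True)
print(plain(1, 8), plain_calls[0])
print(cached(1, 8), cached_calls[0])
```

Both solvers return the same cost, but the cached version evaluates each (j, k) pair at most once.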



Figure 3. An example of a situation where a recursive solution to the data compression
problem does not take advantage of the optimal substructure inherent in the problem.


The complexity involved in the top-down approach of solving the problem can be reduced
by using a dynamic programming approach. While the recursive algorithm solves the
subproblems a number of times, the dynamic programming approach solves each of them only once.
The dynamic programming approach obtains an optimal solution to the problem by using a
bottom-up approach, where the costs of the optimal cutsets of a subtree are obtained from those
of its subtrees. The optimal solutions to the various subproblems are stored in tabular form
and are looked up whenever necessary, instead of being solved all over again.
Figure 4. The figures indicate that both $CS_1$ and $CS_2$ have the same subproblem,
indicated within the dotted box. A recursive solution would solve the subproblem during the
evaluation of both $CS_1$ and $CS_2$.


The table for an input sequence of size eight is shown in Figure 5. As is evident from the
figure, the table contains a block for each node in the compression tree. Thus the number of
blocks at any level in the table is equal to the number of nodes in the tree at the
corresponding level. The number of entries in each block of the table is equal to the maximum
number of resources (cutset nodes) that can be allocated to a subtree with its root at the
corresponding node in the tree. The entries in block $j$ store the values of the cost
$C(j, k)$, $1 \le k \le$ maximum resources, of all the optimal cutsets. The values of
$C(j, k)$ are found using formula (1). Note that the values of $C(2j, \cdot)$ and
$C(2j+1, \cdot)$ in (1) have already been computed and stored in the table, and can be
obtained by looking up the corresponding entries. The complexity of solving the problem using
this technique is $O(n^2 \log n)$.
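
A minimal sketch of the tabular, bottom-up computation is given below. It is not the authors' implementation. It assumes heap-style node numbering with the root at node 1 and an input length that is a power of two, stores one block per node with one entry per admissible cutset size, and additionally records the best split so that the optimal cutset itself can be recovered. The name `optimal_cutset` is hypothetical.

```python
def optimal_cutset(x, k):
    """Return (C(1, k), cutset), where cutset lists the chosen node numbers."""
    n = len(x)
    assert n > 0 and n & (n - 1) == 0 and 1 <= k <= n
    mean = [0.0] * n + [float(v) for v in x]
    size = [0] * n + [1] * n
    # table[j][m] = C(j, m) for 1 <= m <= size[j]; entry 0 of a block is unused
    table = [None] * n + [[None, 0.0] for _ in range(n)]
    split = [None] * (2 * n)          # split[j][m] = best p for C(j, m)
    for j in range(n - 1, 0, -1):     # children are always filled in first
        a, b = 2 * j, 2 * j + 1
        size[j] = size[a] + size[b]
        mean[j] = (mean[a] + mean[b]) / 2
        row = [None] * (size[j] + 1)
        best = [None] * (size[j] + 1)
        row[1] = (table[a][1] + table[b][1]
                  + size[j] / 4 * (mean[a] - mean[b]) ** 2)
        for m in range(2, size[j] + 1):
            lo, hi = max(1, m - size[b]), min(size[a], m - 1)
            best[m] = min(range(lo, hi + 1),
                          key=lambda p: table[a][p] + table[b][m - p])
            row[m] = table[a][best[m]] + table[b][m - best[m]]
        table[j], split[j] = row, best

    cutset = []

    def collect(j, m):                # walk the stored splits from the root
        if m == 1:
            cutset.append(j)
        else:
            p = split[j][m]
            collect(2 * j, p)
            collect(2 * j + 1, m - p)

    collect(1, k)
    return table[1][k], cutset


error, nodes = optimal_cutset([1, 1, 1, 1, 9, 9, 5, 5], 3)
print(error, nodes)
```

Each table entry is computed from entries of the two child blocks that are already present, so no subproblem is solved twice.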


5. EXPERIMENTAL RESULTS 

The dynamic programming technique was implemented and tested on a number of
images after representing each image as a 1D sequence. The results of applying the technique
to a human face and to an oceanographic image are presented in Figures 6 and 7,
respectively. It is seen that the technique reproduces the sharp edges very well. It is also
seen that the technique produces very good results on the oceanographic image, as the image
is largely flat.


6. CONCLUSION 
Figure 5. Table for storing the $C(j, k)$ for an input sequence containing eight
elements. The number above each box indicates the corresponding node number in the
compression tree. Note that the number of entries in each box is equal to the maximum
number of nodes that can be allocated to each node.


The proposed method has been tried on several images. The experimental results indicate
that the compression is effective in reproducing sharp edges. The method is expected to be
comparable to wavelet-based methods in performance. Our method divides the 1D sequence of
numbers into several runs whose lengths are restricted to powers of two. This enables us to
identify the optimal cutset in a tractable way. An arbitrary run-length coding is NP. Further
results and comparative studies will be presented in the final paper.
Figure 6. Results of applying the compression technique on a human face.

Figure 7. Results of applying the compression technique on an oceanographic image.



